Abstract

Pain assessment through observational pain scales is necessary for special categories of patients such as neonates, patients with dementia, critically ill patients, etc. The recently introduced Prkachin-Solomon score allows pain assessment directly from facial images, opening the path for multiple assistive applications. In this paper, we introduce the Histograms of Topographical (HoT) features, which are a generalization of the topographical primal sketch, for the description of the face parts contributing to the mentioned score. We propose a semi-supervised, clustering-oriented self-taught learning procedure developed on the emotion-oriented Cohn-Kanade database. We use this procedure to improve the discrimination between different pain intensity levels and the generalization with respect to the monitored persons, while testing on the UNBC McMaster Shoulder Pain database.


1. Introduction 

In the past, the calculator was a mere tool for easing math. The rapid progress in computer science and in integrated micro-mechatronics helped the appearance of assistive technologies. They can improve the quality of life for the disabled, patients and the elderly, but also for healthy people. Assistive technologies include monitoring systems connected to an alarm system to help caregivers manage the activities associated with vulnerable people. One such example is automatic non-intrusive monitoring for pain assessment.

The International Association for the Study of Pain defines pain as "an unpleasant sensory and emotional experience associated with actual or potential tissue damage, or described in terms of such damage" (J. Boyd et al., 2011). Assessment of pain was shown to be a critical factor for psychological comfort in the periods spent waiting at emergency units (Gawande, 2004). Typically, the assessment is based primarily on the self-report and several procedures are at hand; details can be retrieved from (Hugueta et al., 2010) and from the references therein. Complementary to the self-report, there are observational scales for pain assessment; a review may be found in (von Baeyer & Spagrud, 2007). If both methods are available, the self-report should be the preferred choice (Shavit et al., 2008).

Yet, there are several aspects that strongly motivate the necessity of the observational scales: (1) Adult patients, typically, self-assess the pain intensity using a no-reference system, which leads to inconsistent properties across scale, reactivity to suggestion, efforts at impressing unit personnel etc. (Hadjistavropoulos & Craig, 2004); (2) Patients with difficulties in communication (e.g. newborns, patients with dementia, critically ill patients) cannot self-report and assessment by specialized personnel is demanded (von Baeyer & Spagrud, 2007), (Haslam et al., 2011); (3) Pain assessment by nurses encounters several difficulties. The third aspect is detailed by Manias et al. (Manias et al., 2002), who name four practical barriers that emerged from thorough field observations: (a) nurses encounter interruptions while responding to activities relating to pain; (b) nurses' attentiveness to the patient cues of pain varies due to other activities related to the patients; (c) nurses' interpretations of pain vary, with the incisional pain being the primary target of attention; and (d) nurses attempt to address competing demands of fellow nurses, doctors and patients. To respond to these aspects, automatic appraisal of pain by observational scales is urged.

Among the multiple observational scales existing at the moment, the revised Adult Nonverbal Pain Scale (ANPS-R) and the Critical Care Pain Observation Tool (CPOT) have been consistently found reliable (Stites, 2013), (Topolovec-Vranic et al., 2013), (Chanques et al., 2014). Both scales include evaluation of multiple factors, out of which the first is the dynamic of the face expression. Intense pain is marked by frequent grimace, tearing, frowning, wrinkled forehead (in ANPS-R) and, respectively, frowning, brow lowering, orbit tightening, levator contraction and eyelid tightly closed (in CPOT).

The mentioned facial dynamics, in fact, overlap some of the action units (AU) as they have been described by the seminal Facial Action Coding System (FACS) introduced by 2002. A practical formula to contribute to the overall pain intensity assessment from facial dynamics is the Prkachin-Solomon formula (Prkachin & P. Solomon, 2008). Here, the pain is quantized in 16 discrete levels (0 to 15) obtained from the quantization of the 6 contributing face AUs:

$$\mathrm{Pain} = AU_4 + \max(AU_6, AU_7) + \max(AU_9, AU_{10}) + AU_{43} \tag{1}$$
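As an illustration of eq. (1), the score can be computed per frame from the six AU intensities. The snippet below is only a minimal sketch: the function name `pspi_score` and the example intensities are hypothetical and are not taken from the UNBC annotations.

```python
def pspi_score(au4, au6, au7, au9, au10, au43):
    """Prkachin-Solomon pain score of one frame, following eq. (1).

    The arguments are the per-frame intensities of the six contributing
    action units (hypothetical inputs; AU 43 is binary in the text).
    """
    return au4 + max(au6, au7) + max(au9, au10) + au43


# Hypothetical frame: brow lowering 2, cheek raise 3, lid tightening 1,
# nose wrinkle 0, upper lip raise 2, eyes closed.
example = pspi_score(au4=2, au6=3, au7=1, au9=0, au10=2, au43=1)
print(example)  # 2 + 3 + 2 + 1 = 8
```

A frame with no active AU yields a score of 0, which corresponds to the "no pain" case used by the binary detectors discussed later.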

The Prkachin-Solomon formula has the cogent merit of permitting direct appraisal of the pain intensity from digital face image sequences acquired by regular video-cameras and image analysis. Thus, it clears the path for multiple applications in the assistive computer vision domain. For instance, in probably the most intuitive implementation (Ashraf et al., 2009), by means of digital recording, a patient is continuously monitored and when an expression of pain is detected, an alert signal triggers the nurse's attention; he/she will further check the patient's state and will consider measures for pain alleviation. Such a system may be employed in intensive care units, where its main purpose would be to reduce the workload and increase the efficiency of the nursing staff. Alternatively, it could be used for continuous monitoring of patients with communication disabilities (e.g. neonates) and reduce the cost for permanent caring.

Following further developments (i.e. reaching high accuracy), in both computer vision and pain assessment and management, automatic systems that use the information extracted from video sequences could be applied to infer the pain intensity level and to automatically administer the palliative care.

Another area of applicability is to monitor people performing physical exercises. For patients recovering from orthopedic procedures, such an application would permit near real-time identification of the movements causing pain, thus leading to more efficient adjustments of the recovery program. For athletes or for normal persons training, such an application would contribute to the identification of the weaker muscle groups and to fast improvement of the training program.

In this paper we propose a system for face analysis and, more precisely, for pain intensity estimation, as measured by the Prkachin-Solomon formula, from video sequences. We claim the following contributions¹: (1) we introduce the Histogram of Topographical (HoT) features that are able to address variability in face images; (2) in order to surmount the limited number of persons, a trait typical for the medical-oriented image databases, we propose a semi-supervised, clustering-oriented, self-taught learning procedure; (3) we propose a machine learning based temporal filtering to reduce the influence of the blinks and to increase the overall accuracy; (4) we propose a system for face dynamic analysis that, applied to pain intensity estimation, leads to qualitative results.

¹This paper extends the work from 2014 by improving the transfer method, by supplementary and more intensive testing, and by adding filtering of the temporal sequences, thus boosting the overall performance.

1.1. Prior Art

Although other means of investigation (e.g. bio-medical signals) were discussed (Werner et al., 2013), in the last period significant efforts have been made to identify reliable and valid facial indicators of pain, in an effort to develop non-invasive systems. Mainly, these are correlated with the appearance of three databases: the Classification of Pain Expressions (COPE) database (Brahnam et al., 2007), which focuses on the classification of infant pain expressions, the Bio-Heat-Vid database (Werner et al., 2013), containing records of induced pain, and the UNBC McMaster Pain Database (Lucey et al., 2011), with adult subjects suffering from shoulder pain. As said in the introduction, the majority of the face-based pain estimation methods exploit the Action Unit (AU) face description, previously used in emotion detection, to which pain is correlated. A detailed review of the emotion detection methods is in the work of Zeng et al. (Zeng et al., 2009) and, more recently, in the work of Cohn and De La Torre (Cohn & De la Torre, 2014).

On the COPE database, 2007 exploited Discrete Cosine Transform (DCT) for image description followed by Sequential Forward Selection for reducing the dimensionality and nearest neighbor classification for infant pain detection. On the same database, 2010 relied on relevance vector machine (RVM) applied directly on manually selected infant faces for improved binary pain detection. 2012 used Local Binary Pattern (LBP) and its extension for improved face description and accuracy. We note that the COPE database, containing 204 images of 26 neonates, is rather limited in extent and it is marked with only binary annotations (i.e. pain and no-pain).

2013 fused data acquired from multiple sources and information from a head pose estimator to detect the triggering level and the maximum level of pain supportability, while testing on the BioVid Heat Pain database. One of their contributions was to show that various persons have highly different levels of pain triggers and of supportability levels, thus arguing for pain assessment with multiple grades in order to accommodate personal pain profiles. At the moment of writing this paper, the database is not public yet.

Pain recognition from facial expressions was addressed in the work of 2007, who applied a previously developed AU detector complemented by Gabor filters, AdaBoost and Support Vector Machines (SVM) to separate fake versus genuine cases of pain; their work is based on AUs, thus anticipating the more recent proposals built in conjunction with the UNBC McMaster Pain Database.


Thus, due to its size and the fact that it was made public with expert annotation, the UNBC McMaster Pain Database is currently the de facto dataset for facial based pain estimation. In this direction, 2012 used Active Appearance Models (AAM) to track and align the faces on manually labelled key-frames and further fed them to an SVM for frame-level classification. A frame is labelled as "with pain" if any of the pain related AUs found earlier by 2008 to be relevant is present (i.e. pain score higher than 0). 2013 transferred information from other patients to the current patient, within the UNBC database, in order to enhance the pain classification accuracy over Local Binary Pattern (LBP) features and AAM landmarks provided by 2012. 2013 introduced an approach based on Kernel Mean Matching named Selective Transfer Machine (STM) and trained for person-specific AU detection, that is further tested on pain detection. 2014 and 2014 trained person-specific classifiers augmented with transductive parameter transfer for expression detection with applicability in pain.

We note that all these methods focus on binary detection (i.e. pain/no pain), thus experimenting only with the first level of potential applications. Furthermore, pain (i.e. true case) appears if at least one of the AUs from eq. (1) is present, a case which happens in other expressions too. For instance, AU 9 and 10 are also associated with disgust (Lucey et al., 2010). Another corner case is related to the binary AU 43, which signals the blink; obviously not all blinks are related to pain and the annotation of the UNBC database acknowledges this fact.

Multi-level pain intensity is estimated by the methods proposed in (Kaltwang et al., 2012) and (Rudovic et al., 2013). 2012 jointly used LBP, Discrete Cosine Transform (DCT) and AAM landmarks in order to estimate the pain intensity either via AU or directly at a sequence level processing. 2013 introduced a Conditional Random Field that is further particularized for the person and for the expression dynamics and timing, so as to obtain increased accuracy.

Given the mentioned possible confusion between pain and other expressions and, respectively, the explicit findings from (Werner et al., 2013) regarding person dependent pain variability and the implicit assumption from the pain scales, which use multiple degrees for pain intensity, our work focuses on pain intensity estimation. A byproduct will be pain detection.

We propose a method working in a typical pattern recognition framework. Given a face image and its facial landmarks, our method will identify the regions of interest, which are further described by Histogram of Topographical features. The important dimensions of the face description are selected by a self-taught learning process that is followed by actual pain assessment via a machine learning procedure. An overview of the proposed method is presented in figure 1.

1.2. Paper Organization 

The remainder of the paper is structured as follows: in section 2 we present the databases used. In section 3 we review state of the art feature descriptors and introduce the proposed Histogram of Topographical features. The procedure chosen for transfer learning, together with a discussion of alternatives, is presented in section 4. The system for still, independent, image-based pain estimation is presented in section 5; we follow with the description of the temporal filtering of video sequences. Implementation details and results are detailed in section 6. The paper ends with discussions and conclusions.

2. Databases 

As mentioned, to our best knowledge, there exist three databases with pain annotations. The COPE (Brahnam et al., 2007) is rather small and with binary pain annotations, while the Bio-Heat-Vid (Werner et al., 2013) is to be made public. The UNBC-McMaster Pain Database provides intensity pain annotations for more than 48000 images.

2.1. Pain Database 

We test the proposed system over the publicly available UNBC-McMaster Shoulder Pain Expression Archive Database (Lucey et al., 2011). This database contains face videos of patients suffering from shoulder pain as they perform motion tests of their arms. The movement is either voluntary, or the subject's arm is moved by the physiotherapist. Only one of the arms is affected by pain, but movements of the other arm are recorded as well, to form a control set. The database contains 200 sequences of 25 subjects, totalling 48,398 frames. One of the subjects lacks pain annotations and, thus, will be excluded from testing/training. Examples of pain faces proving the variability of expressions are shown in figure 2.

The Prkachin-Solomon score for pain intensity is provided by the database creators, therefore acting as a ground-truth for the estimation process. While in our work we do not focus on computing the AUs separately, eq. (1) explicitly confirms that databases built for AU recognition are relevant for the pain intensity estimation.


The training/testing scheme is the same as in the cases of (Lucey et al., 2011) or (Kaltwang et al., 2012): leave one person out cross validation; our choice is further motivated in section 6.
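The leave-one-person-out protocol can be sketched as a generator of train/test index splits. This is only an illustrative sketch: the function name `leave_one_person_out` and the toy subject labels are hypothetical, and the snippet makes no claim about how the folds were implemented in the cited works.

```python
def leave_one_person_out(subject_ids):
    """Yield (held_out, train_idx, test_idx), one fold per distinct subject.

    subject_ids[i] is the identity of the person shown in frame i, so all
    frames of the held-out person go to the test set of that fold.
    """
    for held_out in sorted(set(subject_ids)):
        train_idx = [i for i, s in enumerate(subject_ids) if s != held_out]
        test_idx = [i for i, s in enumerate(subject_ids) if s == held_out]
        yield held_out, train_idx, test_idx


# Toy example with three hypothetical persons and six frames.
frame_subjects = ["p1", "p1", "p2", "p3", "p3", "p3"]
folds = list(leave_one_person_out(frame_subjects))
for person, train_idx, test_idx in folds:
    print(person, train_idx, test_idx)
```

Because no frame of the tested person is seen during training, the scheme measures generalization to new persons rather than to new frames of known persons.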

2.2. Non Pain Database 

Noting the limited number of persons available within the UNBC database (i.e. only 23 for the training phase), we extend the data utilized for learning with additional examples from a non-pain specific database, more precisely, the Cohn-Kanade database (Kanade et al., 2000). This contains 486 sequences from 97 persons and each sequence begins with a neutral expression and proceeds to a peak expression. The peak expression for each sequence is coded in the FACS system, thus having the AU annotated. Relevant pairs of neutral/expression from the Cohn-Kanade database may be seen in figure 3.

3. Histogram of Topographical Features 

To extract the facial deformation due to expression, we introduce a novel local/global descriptor, namely the Histogram of Topographical (HoT) features. To properly place it in context, we start by reviewing the most important image descriptors.

3.1. Global/Local Image Descriptors - State of the Art

Many types of local image descriptors are used across the plethora of computer vision applications (Tuytelaars & Mikolajczyk, 2008). The majority of the solutions computed in the image support domain² are approachable within the framework of the Taylor series expansion of the image function, namely with respect to the order of the derivative used.

Considering the zero-order coefficient of the Taylor series, i.e. the image values themselves, one of the most popular descriptors is the histogram of image values and, respectively, the data directly, which was employed for instance in AAM (Cootes et al., 2001) to complement the landmarks shape. Next, relying on the first derivative (i.e. the directional gradient), several histogram related descriptors such as HOG (Dalal & Triggs, 2005) or SIFT (Lowe, 2004) gained popularity.

The second-order image derivative (i.e. the Hessian matrix) is stable with respect to image intensity and scale and was part of SIFT (Lowe, 2004) and SURF (Bay et al., 2008) image key-point detectors. 2007 used the dominant eigenvalue of the Hessian matrix to describe the regions in terms of principal curvature, while 1998 deployed a hard classification of the Hessian eigenvalues in each pixel (thus identifying the degree of local curviness) to describe tubular structures (e.g. blood vessels) in medical images.

²Here, alternatively to the image domain we assume the spectral domains where popular descriptors such as DCT or wavelet coefficients are defined.

Figure 1. The schematic of the proposed continuous pain estimation method.

Figure 2. Face crops from UNBC-McMaster Shoulder Pain Expression Archive Database (Lucey et al., 2011). The top two rows illustrate the variability of pain faces while the bottom row illustrates non-pain cases. Note the similarity between the two situations.

Figure 3. Face crop pairs (neutral in the top row and, respectively, with expression in the bottom row) from the Cohn-Kanade database.

Summarizing, we stress that all the mentioned state of the art systems rely on information gathered from a single Taylor coefficient of either order zero, one or two in order to describe images globally or locally.

The approximation of the image in terms of the first two Taylor series coefficients is the foundation of the topographical primal sketch introduced by 1983, which is inspired by the prior 1980 Laplacian based sketch representation. The primal sketch was further adopted for face description by 2007. In the primal sketch, the description of the image is limited to a maximum number of 12 (or 16) classes which correspond to the basic topographical elements. A further extension lies in the work of 2009, who applied the Hessian for locating key-points and described their vicinity with the histogram of color values (order zero) and with the histogram of oriented gradients (order one). 2006 developed both the first and second derivative blob measures for an approach derived from primal sketch features in terms of scale-invariant edge and ridge features; yet they focus only on interest points and use different measures than our proposal.

In parallel to our work, 2014 proposes four strength measures extracted by similarity with the second order moment based Harris and Shi-Tomasi operator (Shi & Tomasi, 1994), but from the Hessian's eigenvalues, that can be used to identify interest points.

We consider that all pixels from a region of interest carry important topographic information which can be gathered in orientation histograms or in normalized magnitude histograms. In certain cases, only a combination of these may prove to be informative enough for a complete description of images.

3.2. Feature Computation 

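In a seminal work, 1983 introduced the so-called topographical primal sketch. The gray-scale image is considered as a function $I : \mathbb{R}^2 \rightarrow \mathbb{R}$. Given such a function, its approximation in any location $(i, j)$ is done using the second-order Taylor series expansion:

$$I(i + \Delta_i, j + \Delta_j) \approx I(i,j) + \nabla I \cdot \begin{bmatrix} \Delta_i \\ \Delta_j \end{bmatrix} + \frac{1}{2} \begin{bmatrix} \Delta_i & \Delta_j \end{bmatrix} \mathcal{H}(i,j) \begin{bmatrix} \Delta_i \\ \Delta_j \end{bmatrix} \tag{2}$$

where $\nabla I$ is the two-dimensional gradient and $\mathcal{H}(i,j)$ is the Hessian matrix.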


Eq. (2) states that a surface is composed of a continuous component and some local variation. A first order expansion uses only the $\nabla I$ term (the inclination amplitude) to detail the "local variation", while the second order expansion (i.e. the Hessian, $\mathcal{H}$) complements it with information about the curvature of the local surface. Considering the gradient and Hessian eigenvalues, a region could be classified into several primal topographical features. This implies a hard classification and carries a limitation, as it is not able to distinguish, for instance, between a deep and a shallow pit. We further propose a smoother and more adaptive feature set by considering the normalized local histograms extracted from the magnitude of Hessian eigenvalues, the eigenvectors orientation and, respectively, the magnitude and the orientation of the gradient.

1998 employed the concepts of linear scale space theory (Iijima, 1962), (Florack et al., 1992), (Lindeberg, 1994) to elegantly compute the image derivatives. Here, the image space is replaced by the scale space of an image $L(i,j,\sigma)$:

$$L(i,j,\sigma) = G(i,j,\sigma) * I(i,j) \tag{3}$$

where $*$ stands for convolution and $G(i,j,\sigma)$ is a rotationally symmetric Gaussian kernel with variance $\sigma^2$ (the scale parameter):

$$G(i,j,\sigma) = \frac{1}{2\pi\sigma^2} e^{-\frac{i^2 + j^2}{2\sigma^2}} \tag{4}$$

The differentiation is computed by a convolution with the derivative of the Gaussian kernel:

$$L_i(i,j,\sigma) = I(i,j) * \frac{\partial}{\partial i} G(i,j,\sigma) \tag{5}$$
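The idea of differentiating by convolving with a Gaussian derivative can be illustrated in one dimension. The sketch below is an assumption-laden toy, not the authors' implementation: the kernels are sampled on an integer grid, truncated at three standard deviations and normalized to unit sum, and the helper names (`gaussian_kernel_1d`, `gaussian_derivative_kernel_1d`, `convolve_valid`) are hypothetical. In two dimensions the same kernels would be applied separably along each axis.

```python
import math


def gaussian_kernel_1d(sigma, radius=None):
    """Sampled Gaussian, truncated at 3*sigma by default and normalised to sum 1."""
    radius = int(math.ceil(3 * sigma)) if radius is None else radius
    g = [math.exp(-(x * x) / (2.0 * sigma * sigma)) for x in range(-radius, radius + 1)]
    total = sum(g)
    return [v / total for v in g]


def gaussian_derivative_kernel_1d(sigma, radius=None):
    """Sampled first derivative of the Gaussian: dG/dx = -x / sigma^2 * G(x)."""
    radius = int(math.ceil(3 * sigma)) if radius is None else radius
    g = gaussian_kernel_1d(sigma, radius)
    return [-x / (sigma * sigma) * v for x, v in zip(range(-radius, radius + 1), g)]


def convolve_valid(signal, kernel):
    """Discrete convolution, keeping only positions where the kernel fits entirely."""
    k = len(kernel)
    flipped = kernel[::-1]
    return [sum(signal[t + m] * flipped[m] for m in range(k))
            for t in range(len(signal) - k + 1)]


sigma = 2.0
g = gaussian_kernel_1d(sigma)
dg = gaussian_derivative_kernel_1d(sigma)

ramp = [2.0 * t for t in range(40)]   # signal with constant slope 2
flat = [5.0] * 40                     # constant signal

slope_estimate = convolve_valid(ramp, dg)[0]
flat_response = convolve_valid(flat, dg)[0]
print(slope_estimate, flat_response)
```

On the ramp the response is close to the true slope (slightly smaller because of sampling and truncation), while on the constant signal it vanishes, which is the behaviour expected from a smoothed derivative.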


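In the scale space, the Hessian matrix $\mathcal{H}(i,j,\sigma)$ at location $(i, j)$ and scale $\sigma$ is defined as:

$$\mathcal{H}(i,j,\sigma) = \begin{bmatrix} L_{ii}(i,j,\sigma) & L_{ij}(i,j,\sigma) \\ L_{ji}(i,j,\sigma) & L_{jj}(i,j,\sigma) \end{bmatrix} \tag{6}$$

where $L_{ii}(i,j,\sigma)$ is the convolution of the Gaussian second order derivative $\frac{\partial^2}{\partial i^2} G(i,j,\sigma)$ with the image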




Pain Intensity Estimation by a Self-Taught Selection of Histograms of Topographical Features 


$I$ at location $(i, j)$, and similarly for $L_{ij}(i,j,\sigma) = L_{ji}(i,j,\sigma)$ and $L_{jj}(i,j,\sigma)$. Further analysis requires the computation of the eigenvalues and eigenvectors of the Hessian matrix.

The switch from the initial image space to the scale space not only simplifies the calculus, but the implicit smoothing also reduces the influence of noise over the topographic representation, an influence that was signaled as a weak point from inception by 1983.

The decomposition of the Hessian in eigenvalue representation yields the principal directions in which the local second order structure of the image can be decomposed. The second order hints to the surface curvature and, thus, to the direction of the largest/smallest bending. We will denote the two eigenvalues of the Hessian matrix by $\lambda_1(i,j,\sigma) < \lambda_2(i,j,\sigma)$. The eigenvector corresponding to the largest eigenvalue is oriented in the direction of the largest local curvature; this direction of the principal curvature is denoted by $\theta_\lambda(i,j,\sigma)$. A visual example with gradient and curvature images of a face is shown in figure 4.
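For a symmetric 2x2 Hessian the eigenvalues and the principal direction have a closed form, which the following sketch illustrates. It is written under stated assumptions: the eigenvalues are ordered by signed value, the angle is measured from the first image axis towards the second, and the function name `hessian_eigen` is hypothetical.

```python
import math


def hessian_eigen(l_ii, l_ij, l_jj):
    """Eigen-decomposition of the symmetric matrix [[l_ii, l_ij], [l_ij, l_jj]].

    Returns (lam1, lam2, theta) with lam1 <= lam2 (assumed ordering) and
    theta the orientation, in radians, of the eigenvector associated with lam2.
    """
    mean = 0.5 * (l_ii + l_jj)
    radius = math.hypot(0.5 * (l_ii - l_jj), l_ij)
    lam1, lam2 = mean - radius, mean + radius
    theta = 0.5 * math.atan2(2.0 * l_ij, l_ii - l_jj)
    return lam1, lam2, theta


# Toy Hessian whose dominant direction lies along the diagonal (45 degrees).
lam1, lam2, theta = hessian_eigen(1.0, 1.0, 1.0)
print(lam1, lam2, math.degrees(theta))
```

Computing the pair of eigenvalues and the angle per pixel gives exactly the three quantities from which orientation and magnitude histograms over a region can be accumulated.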

3.3. Descriptors for Regions of Interest 

In the remainder of the work, for each region of interest $\Omega$, the following HoT descriptors will be used:

• Second order data (Hessian):

— The histogram of hard voting of image surface curvature orientation. For each pixel in $\Omega$, "1" is added to the orientation of the ridge/valley extracted by computing the angle of the first Hessian eigenvector, if the second eigenvalue is larger than a threshold, $\lambda_2 > T_\lambda$:

$$H_1^H(\theta) = \frac{1}{Z_1} \sum_{(i,j) \in \Omega} \left( \theta_\lambda(i,j) == [\theta] \right) \cdot \left( \lambda_2(i,j) > T_\lambda \right) \tag{7}$$

— The histogram of soft voting ridge orientation adds, instead of "1", the difference between the absolute values of the Hessian eigenvalues:

$$H_2^H(\theta) = \frac{1}{Z_2} \sum_{(i,j) \in \Omega} \left( \theta_\lambda(i,j) == [\theta] \right) \cdot \left( \lambda_2(i,j) - \lambda_1(i,j) \right) \tag{8}$$

The $H_1^H$ and $H_2^H$ histograms produce, each, a vector of length equal to the number of orientation bins and describe the curvature strength in the image pixels.

— The range-histogram of the smallest eigenvalue, given a predefined range interval (e.g. $[0, M_{\lambda_2} = 30]$):

$$H_3^H(k) = \frac{1}{Z_3} \sum_{(i,j) \in \Omega} \left( \lambda_2(i,j) \in \left[ (k-1) \frac{M_{\lambda_2}}{N_{bin}};\; k \frac{M_{\lambda_2}}{N_{bin}} \right] \right) \tag{9}$$

Inspired by the Shi-Tomasi operator (Shi & Tomasi, 1994), Lindeberg (Lindeberg, 2014) proposed to scan in that region the smaller Hessian eigenvalues and select the maximum of them as a measure of that region's interest points. We differ by considering that not only the extremum of the minimum eigenvalues matters; instead, we gather all data in a histogram to have the region's global representation.

— The range-histogram of the differences between the eigenvalues, given a predefined differences range interval (e.g. $[0, M_{\lambda_{12}} = 50]$):

$$H_4^H(k) = \frac{1}{Z_4} \sum_{(i,j) \in \Omega} \left( \left( \lambda_1(i,j) - \lambda_2(i,j) \right) \in \left[ (k-1) \frac{M_{\lambda_{12}}}{N_{bin}};\; k \frac{M_{\lambda_{12}}}{N_{bin}} \right] \right) \tag{10}$$

• First order data (gradient):

— Histogram of orientation, $H_1^G$ (Dalal & Triggs, 2005); each pixel having a gradient larger than a threshold, $T_G$, casts one vote;

— Histogram of gradient magnitude, $H_2^G$. The magnitudes are between 0 and a maximum value (100).

The constants $Z_1, \ldots, Z_4$ ensure that each histogram is normalized. Experimentally chosen values for the thresholds are: $T_\lambda = 0.1$ and $T_G = 5$. Each of the histograms is computed on $N_{bin} = 8$ bins.
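The hard and soft voting orientation histograms can be sketched as below. This is a minimal illustration under explicit assumptions: each pixel of the region is given as a tuple (orientation in radians, first eigenvalue, second eigenvalue), the bins evenly span 360 degrees, normalization divides by the histogram sum, and all names (`orientation_bin`, `hard_vote_histogram`, `soft_vote_histogram`) and the toy region are hypothetical.

```python
import math

N_BINS = 8       # number of bins stated in the text
T_LAMBDA = 0.1   # hard-voting threshold stated in the text


def orientation_bin(theta, n_bins=N_BINS):
    """Map an angle in radians to a bin index, assuming the bins span 360 degrees."""
    frac = (theta % (2.0 * math.pi)) / (2.0 * math.pi)
    return min(int(frac * n_bins), n_bins - 1)


def normalise(hist):
    """Divide by the sum so the histogram adds up to 1 (left as zeros if empty)."""
    total = sum(hist)
    return [h / total for h in hist] if total > 0 else hist


def hard_vote_histogram(pixels):
    """Each pixel whose second eigenvalue exceeds T_LAMBDA casts one vote."""
    hist = [0.0] * N_BINS
    for theta, lam1, lam2 in pixels:
        if lam2 > T_LAMBDA:
            hist[orientation_bin(theta)] += 1.0
    return normalise(hist)


def soft_vote_histogram(pixels):
    """Each pixel votes with the eigenvalue difference instead of 1."""
    hist = [0.0] * N_BINS
    for theta, lam1, lam2 in pixels:
        hist[orientation_bin(theta)] += lam2 - lam1
    return normalise(hist)


# Toy region with four pixels: (theta, lambda_1, lambda_2).
region = [(0.1, 0.0, 2.0), (0.2, 0.5, 1.5), (math.pi, 0.0, 0.05), (math.pi, -1.0, 1.0)]
h1 = hard_vote_histogram(region)
h2 = soft_vote_histogram(region)
print(h1)
print(h2)
```

In the toy region the third pixel falls below the threshold, so it is ignored by the hard vote but still contributes a small weight to the soft vote; this is the kind of graded information a hard topographical classification discards.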

4. Modified Self-Taught Spectral Regression

The target database of the proposed system, UNBC, is highly extensive in number of frames, but rather limited with respect to the number of persons (only 25) and to inter-person similarity. This is a typical trait of the medical-oriented image databases, as there are not so many ill persons to be recorded. To increase the robustness of the proposed algorithm, a new mechanism for transfer learning is proposed.

Figure 4. Computing the HoT features for a face: (a) Original face image, (b) The image represented as a surface, (c) Gradient orientation image, (d) Gradient magnitude image, (e) Curvature orientation image and (f) Curvature strength image.

Our work is inspired by the "self-taught learning" paradigm (Raina et al., 2007), which is conceptually similar to inductive transfer learning (Jialin-Pan & Yang, 2010). A source database, described by the unlabelled data $x_u^{(1)}, x_u^{(2)}, \ldots, x_u^{(k)} \in \mathbb{R}^n$, is used to learn the underlying data structure so as to enhance the classification over the labelled data of the target database: $\{(x_l^{(1)}, y^{(1)}), (x_l^{(2)}, y^{(2)}), \ldots, (x_l^{(m)}, y^{(m)})\}$, where $x$ is the data and $y$ are the labels. According to (Raina et al., 2007), the data structure could be learned by solving the following optimization problem:

$$\underset{b,a}{\text{minimize}} \sum_i \left[ \left\| x_u^{(i)} - \sum_j a_j^{(i)} b_j \right\|_2^2 + \beta \left\| a^{(i)} \right\|_1 \right]; \quad \text{s.t. } \|b_j\|_2 \le 1, \; \forall j \tag{11}$$


The minimization problem from eq. (11) may be interpreted as a generalization of the Principal Component Analysis concept³, as it optimizes an overall representation with the purpose of identifying the best minimum set of linear projections. PCA aims to decompose the original data into a low-rank component and a small perturbation, in contrast with Robust PCA (Candes et al., 2011), which decomposes the data into a low-rank matrix and a sparse matrix.


Taking into account that the interest is in classification/regression, we consider that: 1. the source database should be relevant to the classification task over the target database; 2. original features should form relevant clusters such that, 3. the optimization over the source database preserves local grouping. One way to preserve the original data clustering is to compute the Locality Preserving Indexing with the similarity matrix $W$ based on the cosine distance:

$$W_{ij} = \begin{cases} \dfrac{x_i^T x_j}{\|x_i\| \, \|x_j\|} & \text{if } x_i \in N_p(x_j) \vee x_j \in N_p(x_i) \\ 0 & \text{otherwise} \end{cases} \tag{12}$$

³PCA is retrieved by solving $\text{minimize}_{b,a} \sum_i \left\| x_u^{(i)} - \sum_j a_j^{(i)} b_j \right\|_2^2$ s.t. $\|b_j\|_2 = 1$ and $b_1, \ldots, b_T$ orthogonal.

We replaced the cosine distance used in (Florea et al., 2014) with the heat kernel, as in the case of Locality Preserving Projection (He & Niyogi, 2003), with a further adaptation to our problem:

$$W_{ij} = \begin{cases} k \, e^{-\frac{\|x_i - x_j\|^2}{2\sigma^2}} & \text{if } x_i \in N_p(x_j) \vee x_j \in N_p(x_i) \\ 0 & \text{otherwise} \end{cases} \tag{13}$$

where $N_p(x_i)$ contains the $p = 8$ closest neighbors of $x_i$ and $k = 1$ if $x_i$ contains at least one of the action units from eq. (1). The optimization runs over the similarity matrix, such that we solved the following regularized least squares problem over the unlabelled source database:


$$\underset{B = [b_1^T \ldots b_T^T]}{\text{minimize}} \sum_j \left( b_i^T x_u^{(j)} - u_j^i \right)^2 + \alpha \|b_i\|_2^2; \quad i = 1, \ldots, k \tag{14}$$

where $u_j^i$ is the j-th element of the eigenvector of the symmetrical similarity matrix $W$. This process of extracting the data representation (eq. (13), without the adaptation to our problem, and eq. (14)) forms the so-called spectral regression introduced by 2007. A similar transfer learning method was proposed by 2012, with two core differences: data similarity is computed using a hard assignment, compared to the soft approach from eq. (13), and unsupervised clustering was performed on the target database.
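The neighbourhood-restricted heat kernel similarity of eq. (13) can be sketched in a few lines. The snippet is an illustration only and rests on several assumptions: Euclidean distances, a single scalar weight `k` applied to every pair (the text makes it depend on the AU content of the sample), a zero diagonal, and a free bandwidth `sigma`; the function name `heat_kernel_similarity` and the toy points are hypothetical.

```python
import math


def heat_kernel_similarity(points, p=8, sigma=1.0, k=1.0):
    """Similarity matrix: heat kernel between samples that are p-nearest neighbours.

    W[i][j] is non-zero when i is among the p nearest neighbours of j,
    or j is among the p nearest neighbours of i (the 'or' keeps W symmetric).
    """
    n = len(points)

    def sq_dist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))

    # p nearest neighbours of every sample, excluding the sample itself.
    neighbours = []
    for i in range(n):
        order = sorted((j for j in range(n) if j != i),
                       key=lambda j: sq_dist(points[i], points[j]))
        neighbours.append(set(order[:p]))

    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j and (i in neighbours[j] or j in neighbours[i]):
                w[i][j] = k * math.exp(-sq_dist(points[i], points[j]) / (2.0 * sigma ** 2))
    return w


# Toy data: three nearby samples and one distant sample, one neighbour each.
toy = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
W = heat_kernel_similarity(toy, p=1, sigma=1.0)
for row in W:
    print([round(v, 4) for v in row])
```

Compared with a hard 0/1 neighbourhood graph, the kernel value decays smoothly with distance, so close neighbours weigh more than distant ones in the subsequent eigen-analysis.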

Finally, the labelled new data is obtained by classification of the projected vectors $z_l^{(i)}$, determined as:

$$z_l^{(i)} = B x_l^{(i)}, \quad \forall i = 1, \ldots, m \tag{15}$$

where $B = [b_1 \ldots b_T]$.

In our algorithm, the neutral image and, respectively, the images with the apex emotion from the Cohn-Kanade database were the unlabelled data from the source database, while the UNBC was the target, labelled, database. The transfer learning process and the projection equation, (15), were applied independently on the Hessian based histograms, $[H_1^H, \ldots, H_4^H]$ and, respectively, on the gradient based histograms $[H_1^G, H_2^G]$.

The transfer learning also includes a dimensionality reduction (i.e. a feature selection procedure). The full HoT feature has 240 dimensions, while there are 7937 images with Prkachin-Solomon score higher than 0. Taking into account that only part of them are utilized for training, feature selection is required to prevent the classifier from falling into the curse of dimensionality. The Hessian based histograms are reduced to $T_H = 32$ dimensions, while the gradient ones are reduced to $T_G = 24$.

5. System 

5.1. Still image pain estimation 

The schematic of the proposed system for pain intensity assessment in independent, still images is presented in figure 5 (b). The procedure for HoT features extraction is presented in figure 5 (a).


5.2. Landmark localization and annotations 

The UNBC landmarks are accurate (Lucey et al., 2011), yet their information is insufficient to provide robust pain estimation. In this sense, 2012 reported that using only points, for direct pain intensity estimation, a mean square error of 2.592 and a correlation coefficient of 0.363 are achieved (as also shown in table 1).

Due to the specific nature of the AUs contributing to pain, and based on the 22 landmarks, we have selected 5 areas of interest, shown in figure 5 (a), as carrying potentially useful data for pain intensity estimation.

Due to the variability of the encountered head poses,
we started by roughly normalizing the images: we
ensured that the eyes were horizontal and the inter-ocular
distance was always the same (i.e. 50). Out-of-plane
rotation was not dealt with explicitly, but implicitly
by the use of the histograms as features. Since the 8
histogram bins span 360 degrees, each bin covers 45° and
the robustness to head rotation is up to 22.5°.
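
As an illustration of the rough normalization, the following is a minimal sketch of a similarity transform that makes the eye line horizontal and fixes the inter-ocular distance. The function names are hypothetical, the eye centres are assumed to be given as (x, y) image coordinates, and the value 50 is assumed to be expressed in pixels; the original implementation may differ.

```python
import numpy as np


def eye_alignment_transform(left_eye, right_eye, target_dist=50.0):
    """Return a 2x3 similarity transform (rotation, scale, translation).

    After applying it, the left eye sits at the origin and the right eye at
    (target_dist, 0), i.e. the eye line is horizontal.
    """
    left_eye = np.asarray(left_eye, dtype=float)
    right_eye = np.asarray(right_eye, dtype=float)
    d = right_eye - left_eye
    dist = np.hypot(d[0], d[1])
    if dist == 0:
        raise ValueError("eye centres coincide")
    angle = np.arctan2(d[1], d[0])      # current tilt of the eye line
    scale = target_dist / dist          # isotropic scale factor (assumed pixels)
    c = scale * np.cos(-angle)
    s = scale * np.sin(-angle)
    rot = np.array([[c, -s], [s, c]])   # rotate by -angle and scale
    shift = -rot @ left_eye             # send the left eye to the origin
    return np.hstack([rot, shift[:, None]])


def apply_transform(matrix, points):
    """Apply the 2x3 transform to an (n, 2) array of (x, y) points."""
    pts = np.asarray(points, dtype=float)
    return pts @ matrix[:, :2].T + matrix[:, 2]
```

The same matrix can be applied to all landmarks of the frame, or passed to an image warping routine, so that the areas of interest are cut from comparably scaled faces.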

5.3. Temporal Filtering 

In our previous work on the topic (Florea et al., 2014)
and also in (Kaltwang et al., 2012) it was noted that,
although it is marked by equation (1), the blink does not
always signal pain. Unfortunately, the blink is
sufficiently obvious that an automatic system concludes
that it is pain. The main difference between the blink
and the pain face is duration: blinks typically take
less than 15 frames, while pain faces are longer. To
further differentiate between the two, we consider three
versions of temporal filtering of the sequences.

The first solution is a simple filtering aimed at reducing
the noise. Here we started with a median filter on
a vicinity of width w, followed by a linear regression
(LR) over the same window to estimate the current
value. The preferred window size is w = 21.
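
The first filtering solution can be sketched as below. This is a minimal sketch under stated assumptions: the window is taken as centred on the current frame, the sequence ends are replicated, and the function names are hypothetical; the text does not specify these details.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def _centred_windows(signal, w):
    """All length-w windows centred on each sample (edges are replicated)."""
    half = w // 2
    padded = np.pad(signal, half, mode="edge")
    return sliding_window_view(padded, w)


def median_then_linear(z, w=21):
    """Median filter followed by a local least-squares line, both of width w."""
    z = np.asarray(z, dtype=float)
    if w < 3 or w % 2 == 0:
        raise ValueError("w must be an odd integer >= 3")
    med = np.median(_centred_windows(z, w), axis=1)

    # Fit a line a*t + b to every window of the median-filtered signal.
    t = np.arange(w) - w // 2                    # time index, 0 at the centre
    design = np.column_stack([t, np.ones(w)])    # shape (w, 2)
    coeffs = np.linalg.pinv(design) @ _centred_windows(med, w).T  # shape (2, n)
    # Evaluated at the centre (t = 0), the fitted line equals its intercept b.
    return coeffs[1]
```

With a centred window the time index sums to zero, so the intercept of the fitted line equals the window mean; a window that ends at the current frame would instead extrapolate the trend. A run of high estimates shorter than half the window, such as a blink of fewer than 15 frames with w = 21, is removed by the median step.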

The second and third solutions rely on machine learning
approaches, where, given the data from the vicinity
of the current frame, a classifier attempts to estimate a
better value.

Figure 5. (a) The feature extraction procedure. (b) The transfer knowledge system. The internal data representation is
computed on unlabelled data from the Cohn-Kanade database to make use of the larger number of persons. The reduced
data is fitted in order to predict pain intensity.

The difference between the two considered solutions
lies in data description:

• The feature vector is formed by the pain estimates
for the frames in the vicinity w taken from the
sequence. One expects that, given a large enough
window size w, the classifier will learn to skip the
blink. Here the only classifier that produced good
results was an MLP with two hidden layers (of 40
neurons each) and a single output. The MLP may
be seen as a generalization of the linear regression,
in the sense that different feature dimensions
contribute with different weights.


• The feature is obtained by considering statistical
moments computed on increasing vicinities of
still image pain estimates. With such a description,
we expect that the patterns that typically appear
erroneously in the estimated data are learned
and skipped in testing. Here, the feature of
the frame i is:

$$m_i = \left[ z_i,\ \sigma_i^{w_1},\ \sigma_i^{w_2},\ \dots,\ \sigma_i^{w} \right] \qquad (16)$$

where $z_i$ is the pain estimate for the frame i, while
$\sigma_i^{w}$ is the variance of the pain estimates over a
window centered in i and having the width w, computed
for increasing widths $w_1 < w_2 < \dots < w$. This
description is inspired by the total strict pixel
ordering (Coltuc & Bolon, 1999), (Florea et al.,
2007). Again, we empirically found the best value
for the window size to be w = 61.

In this case an SVR leads to better correlation,
while the MLP with 2 hidden layers shows a smaller
mean square error.

The main idea behind these solutions is to gather data
from vicinities larger than the blink duration and to allow
the classifier to distinguish between blinks relevant to
pain and those which are not relevant. Furthermore,
we determined that still pain estimation produces
patterns of estimates in pain onset and offset and we aim
to improve the performance in such cases.
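
The two data descriptions used by the learned temporal filters can be sketched as follows. This is only an illustration under assumptions: the function names are hypothetical, the windows are centred with replicated sequence ends, and the list of increasing window widths is a free parameter, since the text fixes only the largest width.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def vicinity_features(z, w):
    """Row i holds the w still-image estimates centred on frame i."""
    z = np.asarray(z, dtype=float)
    if w < 1 or w % 2 == 0:
        raise ValueError("w must be a positive odd integer")
    padded = np.pad(z, w // 2, mode="edge")   # replicate the sequence ends
    return sliding_window_view(padded, w).copy()


def moment_features(z, widths):
    """Row i is [z_i, variance over widths[0], ..., variance over widths[-1]]."""
    z = np.asarray(z, dtype=float)
    columns = [z]
    for w in widths:
        # Population variance of the estimates in a centred window of width w.
        columns.append(vicinity_features(z, w).var(axis=1))
    return np.column_stack(columns)
```

For instance, `vicinity_features(z, 21)` would feed the first description to a regressor, while `moment_features(z, range(3, 62, 2))` is one possible choice of increasing vicinities up to a width of 61 for the second description.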


6. Results and Experiments 

6.1. Objective Metrics 

To objectively evaluate the performance of the proposed
approach for the task of continuous pain intensity
estimation according to the Prkachin-Solomon formula,
several metrics are at hand. The mean squared
error (ε²) and the Pearson correlation coefficient (ρ)
between the predicted pain intensity and the ground truth
pain intensity are used to appraise the accuracy of the
continuous pain intensity. These measures were also used
by (Kaltwang et al., 2012), thus direct comparison is
straightforward.

For the pain detection, all frames with a Prkachin-Solomon
score higher than zero are considered as frames with pain,
and the measure adopted is the Area Under the ROC Curve
(AUC). While we argue against the relevance of this
method for pain estimation in assistive computer
vision, the measure is relevant to evaluate the
theoretical performance of a face analysis method. The
AUC was also used by several other works (Lucey
et al., 2012), (Chen et al., 2013), (Chu et al., 2013),
(Zen et al., 2014), (Sangineto et al., 2014), and it
facilitates direct comparison with state of the art solutions.
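
The three measures can be computed as in the following minimal sketch. The function names are hypothetical; the AUC is obtained through the rank-sum (Mann-Whitney) formulation, which is one standard way of computing the area under the ROC curve and returns a value in [0, 1], while the tables report it as a percentage.

```python
import numpy as np


def mean_squared_error(y_true, y_pred):
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))


def pearson_correlation(y_true, y_pred):
    return float(np.corrcoef(y_true, y_pred)[0, 1])


def _average_ranks(x):
    """1-based ranks of x, tied values receive the average of their ranks."""
    order = np.argsort(x, kind="mergesort")
    sorted_x = x[order]
    ranks = np.empty(len(x), dtype=float)
    i = 0
    while i < len(x):
        j = i
        while j + 1 < len(x) and sorted_x[j + 1] == sorted_x[i]:
            j += 1
        ranks[order[i:j + 1]] = 0.5 * (i + j) + 1.0
        i = j + 1
    return ranks


def roc_auc(labels, scores):
    """Area under the ROC curve; labels are True for frames with pain."""
    labels = np.asarray(labels).astype(bool)
    scores = np.asarray(scores, dtype=float)
    n_pos = int(labels.sum())
    n_neg = labels.size - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes are needed")
    ranks = _average_ranks(scores)
    u = ranks[labels].sum() - n_pos * (n_pos + 1) / 2.0
    return float(u / (n_pos * n_neg))
```

For the detection measure, the labels would be obtained by thresholding the ground truth, e.g. `roc_auc(y_true > 0, y_pred)`.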

6.2. Testing and Training 

The training-testing scheme used for both still and
sequence related pain estimation is the leave-one-person-out
cross-validation. The same scenario is employed
in previous works on the topic (Lucey et al.,
2012), (Chen et al., 2013), (Chu et al., 2013), (Zen
et al., 2014), (Sangineto et al., 2014), (Kaltwang et al.,
2012): at a time, data from 23 persons is used for training
and data from 1 person for testing.
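
A leave-one-person-out split can be sketched as below, assuming that each frame carries a person identifier; the function name and the identifier array are hypothetical.

```python
import numpy as np


def leave_one_person_out(person_ids):
    """Yield (person, train_indices, test_indices), one fold per person."""
    person_ids = np.asarray(person_ids)
    for pid in np.unique(person_ids):
        test = np.flatnonzero(person_ids == pid)    # all frames of one person
        train = np.flatnonzero(person_ids != pid)   # frames of everyone else
        yield pid, train, test
```

Because the folds are defined by person rather than by image, no frame of the tested person is ever seen during training.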

Furthermore, a scenario where the testing and training
datasets are disjoint with respect to the person is
motivated by use-cases in emergency units and with critically
ill persons, where it is not possible to have neutral (i.e.
without pain) images for the incoming patients. Thus,
we consider that image oriented k-fold scenarios (e.g.
in (Rudovic et al., 2013)) are more theoretically than
practically oriented.

As the number of images with positive examples (with
a specific AU or with a Pain label) is much lower than
the number containing negative data, for the actual training
the two sets were made even; the chosen negative
examples were randomly selected. To increase the
robustness of the system, three classifiers were trained in
parallel with independently drawn examples and the
system output was taken as the average of the classifiers.
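
The balancing and averaging scheme can be sketched as follows. This is only an illustration: the function names are hypothetical and an ordinary least-squares regressor is used as a stand-in, so the sketch does not reproduce the regressors actually employed by the system.

```python
import numpy as np


def balanced_indices(labels, rng):
    """Pick as many random negatives (label 0) as there are positives."""
    labels = np.asarray(labels)
    pos = np.flatnonzero(labels > 0)
    neg = np.flatnonzero(labels == 0)
    n = min(len(pos), len(neg))
    chosen_neg = rng.choice(neg, size=n, replace=False)
    chosen_pos = pos if len(pos) == n else rng.choice(pos, size=n, replace=False)
    return np.concatenate([chosen_pos, chosen_neg])


def fit_linear(X, y):
    """Stand-in regressor (ordinary least squares), used only for illustration."""
    A = np.column_stack([X, np.ones(len(X))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return lambda Xq: np.column_stack([Xq, np.ones(len(Xq))]) @ coef


def train_ensemble(X, y, n_models=3, seed=0, fit=fit_linear):
    """Train n_models on independently drawn balanced sets, average outputs."""
    rng = np.random.default_rng(seed)
    models = []
    for _ in range(n_models):
        idx = balanced_indices(y, rng)
        models.append(fit(X[idx], y[idx]))
    return lambda Xq: np.mean([m(Xq) for m in models], axis=0)
```

Any regressor exposing the same fit-then-predict pattern can be passed through the `fit` argument in place of the stand-in.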

For the actual discrimination of the pain intensity, we
applied the same model as in similar works
(Lucey et al., 2012), (Kaltwang et al., 2012). We used
two levels of classifiers (late fusion scheme): first, each 
category of features was fed into the set of three Sup¬ 
port Vector Regressors (SVR) (with radial basis kernel 
function, cost 4 and F = 2“^-®). Landmarks were not 
spectrally regressed (i.e were not re-represented with 
eq. (15) ) but directly passed to the SVRs. The results 
were fused together within a second level of boosted 
ensemble of four SVRs. The implementation of the 
SVR is based on LibSVM (Chang & Lin, 2011). 

6.3. Pain Estimation - Results 

The preferred implementation was by direct estimation
of the Prkachin-Solomon pain score. Alternatively,
one may consider AU estimation as an intermediate step,
followed by pain prediction using equation (1);
yet previous research (Kaltwang et al., 2012), (Florea
et al., 2007) showed that this method produces weaker
results since errors accumulate.


The best performing method for individual image
based Prkachin-Solomon pain score estimation produces
a correlation coefficient of ρ = 0.551 and a mean
square error of ε² = 1.187. The area under curve is
AUC = 80.9. The best temporal filtering increased
the correlation to ρ = 0.562 and decreased the mean
square error to ε² = 0.885. The best AUC achieved was
AUC = 82.1. The next subsection will further detail
these results and their implications.

Given a new UNBC image and the relevant landmark
positions, the query to determine the pain intensity
for that image takes approximately 0.15 seconds on a
single thread Matlab implementation on an Intel Xeon
at 3.3 GHz. Temporal filtering adds a delay due to the
consideration of a temporal window around the current
frame; this window is larger than a blink (which has a
typical duration of 300-400 milliseconds) and it adds
a delay of ≈ 1 second.

6.4. Experiments

6.4.1. Are the HoT Features Really Useful?

First, we investigate the capabilities of the HoT features
by considering the following example: we take
the first frontal image without pain for each person
and consider its HoT features as reference; next, we
compute the HoT features of all the images with a pain
intensity higher than 4 and of all the images without
pain for each person separately. We plot the sum of
absolute differences between the reference and the
mentioned images with and without pain,
respectively. The results are presented in figure 6. Ideally,
large values are expected in the left plot and zeros in
the right one. We note that, for this particular example,
the largest contribution in discriminating between
pain and no-pain cases was due to Hessian based
and H 2 histograms. Gradient based histograms lead
to inconclusive differences in the case of intense pain,
while and produced large values also for the
no-pain case.

Furthermore, if considering the first 3 dimensions as
selected by the transferred SR-M, the first 4000 no-pain
and all the intense pain (i.e. higher than 4) cases
are clustered as shown in figure 7. The clusters in the
Hessian based space are fairly visible, suggesting that:
(1) HoT features are more powerful if they include
Hessian based data when addressing the pain problem,
and (2) identification of high pain is doable. Yet, we
did not plot the data corresponding to low levels of
pain, which fills the intermediate space and, in fact,
makes the discrimination difficult.

Figure 6. Sum of absolute differences when comparing all images without pain and, respectively, with intense pain to a
chosen no-pain reference image. Ideally, we aim for large values in the left plot and zeros in the right one. $A_1$ refers to
the first area of interest (i.e. around the left eye), $A_2$ to the second one (around the right eye), etc.
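
The comparison plotted in figure 6 can be sketched as a sum of absolute differences computed separately for each histogram. This is a minimal sketch with a hypothetical function name; it assumes that the feature vector is a plain concatenation of 8-bin histograms, which is consistent with the 240 dimensions (5 areas of interest, 6 histogram types, 8 bins), although the actual ordering of the blocks is not given in the text.

```python
import numpy as np


def sad_per_block(reference, features, bins=8):
    """Sum of absolute differences to the reference, one value per histogram.

    reference: (d,) feature vector of the chosen no-pain image.
    features:  (n, d) feature vectors of the compared images.
    Returns an (n, d // bins) array.
    """
    reference = np.asarray(reference, dtype=float)
    features = np.atleast_2d(np.asarray(features, dtype=float))
    if reference.size % bins or features.shape[1] != reference.size:
        raise ValueError("feature length must match and be a multiple of bins")
    diff = np.abs(features - reference)
    return diff.reshape(features.shape[0], -1, bins).sum(axis=2)
```

Averaging the returned rows over all intense pain images, and separately over all no-pain images, gives one value per area and histogram type for each of the two plots.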


6.4.2. Feature Contribution 

To appraise the overall contribution of each histogram
type to the face based pain intensity estimation, we
present the results in table 1. To have a reference with
respect to state of the art features, we include the
results achieved for the same problem by Kaltwang et
al. (2012). As one may see, if taken
individually, the proposed histograms under-perform
state of the art features. Yet the different categories
complement each other well and by combining them we
obtain improved results.

To detail the contribution of each histogram type, as
defined in section 3, we remove one type of histogram
and observe the overall effect on the pain score. In table 2
we report the accuracy obtained with
only part of the histogram types. The decrease is larger
for the more important types. Landmarks are skipped
for this experiment. As one can see, all the histograms
contribute positively.

6.4.3. Feature Selection and Transfer Learning

In table 3 we present the overall performance when
various possibilities of transfer learning are considered.
The internal data representation may be perceived as
unsupervised feature selection. In this sense, beyond
the proposed modified Spectral Regression (SR-M),
we tested the standard Spectral Regression (SR) (Cai
et al., 2007) and the Locality Preserving Projection
(LPP) (He & Niyogi, 2003), as it is the inspiration for
SR. We also tested the standard Principal Component
Analysis (PCA), as it is the foremost dimensionality
reduction method, and its derivation through
Expectation-Maximization, namely Probabilistic PCA (PPCA)
(Tipping & Bishop, 1999); further, we included the
Factor Analysis (FA), as it is a generalization of PCA,
and a more recent derivation of the PCA: the Robust
PCA (RPCA) based on a Mixture of Gaussians for
the noise model (RPCA-MOG) (Zhao et al., 2014),
which is an improvement over the standard RPCA
introduced by Candes et al. (2011) that
uses Principal Component Pursuit to find a unique
solution.

Table 4. Comparison of the achieved accuracy of pain intensity
estimation when feature selection is learned on the
Cohn-Kanade database (i.e. self-taught learning) or
directly on the UNBC database (i.e. no transfer).

Database for learning    Cohn-Kanade         UNBC
Feature                  SR-M     PPCA       SR-M     PPCA
Mean Square Error        1.187    1.173      1.203    1.181
Correlation, ρ           0.551    0.545      0.532    0.532

Other considered alternatives are to perform no transfer
at all, or to extract the inner data representation
directly from the labelled UNBC database. The
comparative results for these cases are presented in table
4. The results show that, by specifically relying on the
adapted similarity measure (SR-M) and taking into
account a larger number of persons, the discrimination
capability increases.

Figure 7. Data clustering for Hessian based histograms (left) and, respectively, gradient based histograms (right); in each
case the first three axes are retained. Frames with high pain intensity are in red and the first 4000 no-pain
images are in blue. As one can see, the data is fairly clustered for Hessian based features and less so for gradient based ones.

Table 1. Accuracy of pain intensity estimation using the Prkachin-Solomon formula. We report the achieved results for
various versions of the features used: containing only Hessian based histograms ($H_i^H$ - Hess), only gradient based histograms
($H_i^G$ - Grad) and both of them, forming the so called Histogram of Topographical (HoT = Grad+Hess) features; the
complete version contains landmarks (marked as PTS) and HoT. The relevant features were in each case learned with
the modified version of Spectral Regression (SR-M) on the Cohn-Kanade database (CK) via self-taught learning. The
Prkachin-Solomon score is estimated directly by the classifiers, which were trained accordingly.

Work                 Proposed                              (Kaltwang et al., 2012)
Feature              Hess     Grad     HoT      HoT+PTS    PTS      DCT      LBP
Mean Square Error    3.76     4.67     3.35     1.187      2.592    1.712    1.812
Correlation, ρ       0.252    0.341    0.417    0.551      0.363    0.528    0.483

A numerical comparison between our modified version
of spectral regression and the probabilistic PCA, in
transfer, shows little difference. Yet, we argue for the
superiority of our method based on the analysis of the
continuous pain intensity signals: the major difference is
that, while our method shows a bias towards blinks but
gives consistent results, the reduction based on PCA simply
fails in some situations, without our being able to find
any correlation between them. A typical case is illustrated in
figure 8.

6.4.4. Comparison with State of the Art for the Transfer Learning Procedure

To give a quantitative comparison of the performance
of the proposed self-taught learning method, we note
that multiple methods report transfer learning
enhanced performances on the UNBC McMaster Pain
database. All of them applied the same evaluation
procedure.

Chen et al. (2013), Zen et al. (2014) and Sangineto et al.
(2014) used histograms of LBP followed
by PCA reduction of dimensionality and various
classification methods, by directly applying them to the training
data or by relying on transductive transfer learning;
Chen et al. (2013) report results for AdaBoost and Transductive
Transfer AdaBoost (TTA); Chu et al. (2013) for Selective
Transfer Machine (STM); Zen et al. (2014) report results for
Transductive Support Vector Machine (TSVM) and
Support Vector-based Transductive Parameter Transfer
(SVTPT); Sangineto et al. (2014) for
Transductive Parameter Transfer (TPT) with Density
Estimate Kernel. As one may note, all these methods are
transductive transfer learning (i.e. the source and target
tasks are the same, while the source and target
domains are different), while our method is part of the
inductive transfer learning category (i.e. the target
task is different from the source task, no matter whether
the source and target domains are the same or not
(Jialin-Pan & Yang, 2010)).

In table 5 we present the results reported by the
mentioned works compared to the performance of the
proposed method. As one can see, our method reaches
the best accuracy.

Table 2. Contribution of each of the histogram types used. We report the Pearson correlation coefficient, ρ, when the
mentioned type of histogram is removed. The reference is the right-most result (all histograms used). Thus, the smaller
the value (i.e. the larger the decrease), the higher the contribution of the specific type of histogram.

Histogram removed                                                      None (HoT)
Correlation, ρ       0.331   0.368   0.355   0.358   0.351   0.192     0.417

Table 3. Accuracy of pain intensity estimation achieved with self-taught learning (i.e. feature selection was learned
on the Cohn-Kanade database and used on UNBC) for various dimensionality reduction methods. Details are in the text.

Feature              SR-M     SR       LPP      PCA      PPCA     RPCA-MOG   FA
Mean Square Error    1.187    1.183    1.203    1.181    1.173    3.891      2.746
Correlation, ρ       0.551    0.545    0.544    0.541    0.545    0.522      0.540

Table 5. Comparison with state of the art transfer learning
methods using the achieved Area Under Curve (AUC). The
explanation for the acronyms is in the text.

Method                             AUC
AdaBoost (Chen et al., 2013)       76.9
TTA (Chen et al., 2013)            76.5
TSVM (Zen et al., 2014)            69.8
STM (Chu et al., 2013)             76.8
TPT (Sangineto et al., 2014)       76.7
SVTPT (Zen et al., 2014)           78.4
Proposed                           80.9

Figure 8. A waveform for continuous pain intensity
estimation taken from person 1 of the database. The red line
is the pain estimation using Spectral Regression while the
blue one is with PPCA. The modes in the SR plot are much more
visible.

In table 5 we present the results reported by the men¬ 
tioned works comparatively to the performance of the 
proposed method. As one can see, our method reaches 
the best accuracy. 


6.5. Temporal Filtering 

The results achieved with the three methods of tempo¬ 
ral filtering, given the still image pain estimation are 
presented in table 6. 

While analyzing the results, all methods lead to im¬ 
proved mean square error and area under curve. Re¬ 
garding the correlation, from a quantitative point of 
view, the method based on linear regression (LR), 
which has the main purpose of removing the noise 
in the estimated values, performs the best. Yet this 
method is an incremental improvement of the still pain 
estimation. 

The other filtering solutions produce, in fact, mixed
results; while overall they indicate a decrease or a small
increase of the correlation, they boost the
performance on half of the persons by more
than 0.05, on average. The persons with an increase are
the ones where the methods performed better than
average (i.e. the correlation was ρ > 0.5); here, the blinks
were correctly removed and the temporally filtered signal
comes much closer to the ground truth. However,
on persons with below average initial results, the filtering
de-correlates the estimated values even more with
respect to the ground truth. These are persons that
exhibit different pain faces, such as opening the mouth
(e.g. the person from the last column in figure 2) or
bowing the head to the lower left. Concluding, if either
more data is available for learning or if the system is
made more robust with respect to the person, the
machine learning temporal filtering will be more useful;
for now, mere noise reduction does the work.

Table 6. Comparison of the achieved accuracy of pain intensity estimation when the three methods for temporal filtering
were included: based on linear regression (LR), when the vicinity was a feature of the MLP, and with the strict ordering description.

Method               Still    Temporal-LR   Vicinity-MLP   Strict ordering-MLP   Strict ordering-SVM
Mean Square Error    1.187    0.885         1.137          1.200                 1.280
Correlation, ρ       0.551    0.562         0.535          0.529                 0.558

6.6. Comparison with State of the Art

As mentioned in section 1.1, there exist several methods
reporting results on the UNBC Pain database.
Yet, only Kaltwang et al. (2012) and our previous work
(Florea et al., 2014) tested on the entire database, with
separation between persons when considering the
testing/training folds, and reported continuous pain
intensity. The work of (Kaltwang et al., 2012) consists in
trying several combinations of features coupled with a
Relevance Vector Machine (RVM, which is the SVM
reinterpreted under a Bayesian framework) and fused with a
second layer of RVM; we present all of them, to have a better
comparison, in the left hand table of figure 9.

Mainly, the lowest mean square error is obtained by
the still image estimation followed by temporal
filtering, which outperforms the next competitor by nearly
0.3 pain levels. In this category, it is followed by our
previous method (Florea et al., 2014) and by the here
proposed still image estimation. Regarding the
correlation coefficient, our set of methods ranks second after
the combination of DCT with LBP fused by an RVM.
Surprisingly, the direct combination of landmarks with
features reported by (Kaltwang et al., 2012) does not
lead to very good results.

Taking into account that there are different winners in
different categories, to have a better image of the relation
between them, we plotted the results from the table in
figure 9 (a) on MSE vs ρ axes (see figure 9 (b)). In such
a plot, a perfect method will have MSE = 0 and ρ = 1 and it
will be placed in the top left corner. As one can see,
the proposed temporal method is the closest to the perfect
one’s position.
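
The closeness to the perfect method can also be checked numerically on the values listed in the table of figure 9 (a). The sketch below uses the plain Euclidean distance to the point (MSE = 0, ρ = 1) on unnormalized axes; this distance is an assumption made only for illustration, since the comparison in the text is visual.

```python
import math

# (mean square error, correlation) pairs taken from the table in figure 9 (a).
results = {
    "Proposed-Still": (1.187, 0.551),
    "Proposed-Temporal": (0.885, 0.562),
    "PTS+DCT": (1.801, 0.489),
    "PTS+LBP": (1.567, 0.485),
    "PTS+DCT+LBP": (1.386, 0.590),
    "HoT+SR (Florea et al., 2014)": (1.183, 0.545),
}


def distance_to_ideal(mse, rho):
    """Euclidean distance to the perfect method (MSE = 0, rho = 1)."""
    return math.hypot(mse - 0.0, rho - 1.0)


# Methods ordered from the closest to the farthest from the ideal point.
ranking = sorted(results, key=lambda name: distance_to_ideal(*results[name]))
for name in ranking:
    print(f"{name:32s} {distance_to_ideal(*results[name]):.3f}")
```

Because the two axes have different ranges, the mean square error dominates this distance; a normalized distance would weigh the correlation more.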

7. Discussion and Conclusions 

In this paper we introduced the Histograms of Topographical
(HoT) features to describe faces. The addition of
Hessian based terms allowed the separation of various face
movements and, thus, of pain intensity levels. The
robustness of the system was further enhanced by a new
transfer learning method, which was inspired by the
self-taught learning paradigm and relied on preserving
the local similarity of the feature vectors as learned
over a more consistent database in terms of persons,
to ensure that relevant dimensions of the features are
used in the subsequent classification process.

Regarding the actual features, while
their individual contribution was rather small when
compared with established features, they complemented
each other well, as shown by the increase of
the overall performance when all feature types were
used. As shown in the table of figure 9 (a), this is not the
case for features employed in previous solutions, which argues
for the consideration of the complete topographical
description.

The transfer learning from a database with a larger number
of persons increased the system robustness. More
precisely, the solution that did not use the transfer
procedure led to better results on some persons, at the
cost of providing lower accuracy on others that are
more different from the training faces. The transfer
provided more consistent results overall, a fact
shown by the increase of the entropy of the correlation
coefficient from 9.10 to 9.37, enhancing the generalization
with respect to person change. Such a property
is desirable taking into account the different
characteristics of pain expressivity, a trait which prevents
the temporal filter from having an overall beneficial effect
beyond noise reduction. Furthermore, the proposed
transfer learning method performs better when compared
with similar attempts based on transductive
transfer learning, as shown in table 5.

Method                                 MSE, ε²    Correlation, ρ
Proposed-Still                         1.187      0.551
Proposed-Temporal                      0.885      0.562
PTS+DCT (Kaltwang et al., 2012)        1.801      0.489
PTS+LBP (Kaltwang et al., 2012)        1.567      0.485
PTS+DCT+LBP (Kaltwang et al., 2012)    1.386      0.590
HoT+SR (Florea et al., 2014)           1.183      0.545

Figure 9. (a) Numerical comparison of the achieved accuracy of pain intensity estimation with various state of the art
methods. (b) Pearson correlation coefficient vs mean square error for the methods presented in the left-hand table. The
perfect method is placed in the top left corner.

The system does exhibit some failures. Although AU
43 (closing eyes), according to eq. (1), contributes
to pain intensity, not all blinks are pain-related; the
system, as in the case of (Kaltwang et al., 2012),
mistakenly associates blinks of specific persons with pain.
Other failures are in cases where the person’s way
of expressing pain is rather different from most of the
others; for instance, the second person widely opens
the eyes, instead of closing them, leading the system
to produce false negatives. Other errors are related to
the fact that the person is speaking during the test;
false positives are associated with persons who bow (AU
54) or jerk (AU 58) the head while feeling pain; yet the
behavior is not general. Still, while the effort of the
UNBC Pain Database creators was notable and laid
the foundation for advances in non-invasive pain
estimation from facial analysis, the database should be
extended with more subjects to better illustrate the
variability in pain faces.

7.1. Continuation Paths 

In the end, we consider that further research on the
topic is beneficial and we would like to emphasize several
aspects that, in our opinion, motivate such a
necessity. First, the Prkachin-Solomon score was found to
be only moderately to strongly correlated with self-report
(i.e. a Pearson correlation coefficient of 0.66 or higher)
(Hammal & Cohn, 2012), (Prkachin & Solomon,
2008). Secondly, the self-report was found to be the
more accurate means for appraisal of the pain intensity
(Shavit et al., 2008). Thirdly, the observational scores
that were found to be more reliable, such as the revised
Adult Nonverbal Pain Scale (ANPS-R) and the
Critical Care Pain Observation Tool (CPOT), contain
additional indicators of pain, such as rigid or
stiff positions, or restless and excessive activity;
these are gestures recognizable by a system for the analysis
of the body posture. Concluding, additional data
with annotations that inter-correlate the body posture
estimation with facial pain assessment would facilitate
further contributions on the topic of automatic pain
assessment and would make possible a gradual evolution to a
fully developed, autonomous system of assistive computer
vision.